Matthias Gallé

A Unified Framework for Rethinking Policy Divergence Measures in GRPO

Feb 05, 2026

The Multilingual Divide and Its Impact on Global AI Safety

May 27, 2025

Aya Vision: Advancing the Frontier of Multilingual Multimodality

May 13, 2025

Command A: An Enterprise-Ready Large Language Model

Apr 01, 2025

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

Dec 05, 2024

Commit0: Library Generation from Scratch

Dec 02, 2024

On Leakage of Code Generation Evaluation Datasets

Jul 11, 2024

Improving Reward Models with Synthetic Critiques

May 31, 2024

LLMCRIT: Teaching Large Language Models to Use Criteria

Mar 02, 2024

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Feb 26, 2024